{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "RandomForestClassifier.ipynb",
      "provenance": [],
      "authorship_tag": "ABX9TyNPdbZEEZupLA+l3M1Btpf2",
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/maanasvi999/RandomForestClassifier/blob/main/RandomForestClassifier.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Bdd0NPipYiZO"
      },
      "source": [
        "# **Random Forest Classifier**"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "REpViv1EYw_i"
      },
      "source": [
        "**What is a Random Forest Classifier?**\n",
        "\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "kytjke7nZsAq"
      },
      "source": [
        "> Random Forest is a popular machine learning algorithm that belongs to the supervised learning technique. It can be used for both Classification and Regression problems in ML. It is based on the concept of ensemble learning, which is a process of “combining multiple classifiers to solve a complex problem and to improve the performance of the model”.\n",
        "\n",
        "\n",
        "Thus, a random forest is nothing but a collection of **Decision trees**. They are capable of fitting complex data sets while allowing the user to see how a decision was taken. \n",
        "\n",
        "\n",
        "> A decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that only contains conditional control statements.\n",
        "\n",
        "\n",
        "The Random forest classifier creates a set of decision trees from a randomly selected subset of the training set. It is basically a set of decision trees (DT) from a randomly selected subset of the training set and then it collects the votes from different decision trees to decide the final prediction."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "kkuECTa9Zsot"
      },
      "source": [
        "![image.png]()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "FHx8tiaId_fp"
      },
      "source": [
        "Now let's look at a practical implementation:\n",
        "\n",
        "**Random Forest Classifier for Smart Crops Recommendation System**"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "phNrtWhkWCgp"
      },
      "source": [
        "import pandas as pd\n",
        "import numpy as np"
      ],
      "execution_count": 1,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 206
        },
        "id": "bZQsTBkMWrlQ",
        "outputId": "8f537fd0-59ff-452a-cfe1-b655abd10c4e"
      },
      "source": [
        "df_data = pd.read_csv('dataset.csv')\n",
        "df_data.head()"
      ],
      "execution_count": 2,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>N</th>\n",
              "      <th>P</th>\n",
              "      <th>K</th>\n",
              "      <th>temperature</th>\n",
              "      <th>humidity</th>\n",
              "      <th>ph</th>\n",
              "      <th>rainfall</th>\n",
              "      <th>label</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>73</td>\n",
              "      <td>57</td>\n",
              "      <td>44</td>\n",
              "      <td>20.879744</td>\n",
              "      <td>82.002744</td>\n",
              "      <td>6.502985</td>\n",
              "      <td>202.935536</td>\n",
              "      <td>rice</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>74</td>\n",
              "      <td>57</td>\n",
              "      <td>44</td>\n",
              "      <td>21.770462</td>\n",
              "      <td>80.319644</td>\n",
              "      <td>7.038096</td>\n",
              "      <td>226.655537</td>\n",
              "      <td>rice</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>92</td>\n",
              "      <td>41</td>\n",
              "      <td>38</td>\n",
              "      <td>23.004459</td>\n",
              "      <td>82.320763</td>\n",
              "      <td>7.840207</td>\n",
              "      <td>263.964248</td>\n",
              "      <td>rice</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>68</td>\n",
              "      <td>44</td>\n",
              "      <td>38</td>\n",
              "      <td>26.491096</td>\n",
              "      <td>80.158363</td>\n",
              "      <td>6.980401</td>\n",
              "      <td>242.864034</td>\n",
              "      <td>rice</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>66</td>\n",
              "      <td>36</td>\n",
              "      <td>40</td>\n",
              "      <td>20.130175</td>\n",
              "      <td>81.604873</td>\n",
              "      <td>7.628473</td>\n",
              "      <td>262.717340</td>\n",
              "      <td>rice</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "    N   P   K  temperature   humidity        ph    rainfall label\n",
              "0  73  57  44    20.879744  82.002744  6.502985  202.935536  rice\n",
              "1  74  57  44    21.770462  80.319644  7.038096  226.655537  rice\n",
              "2  92  41  38    23.004459  82.320763  7.840207  263.964248  rice\n",
              "3  68  44  38    26.491096  80.158363  6.980401  242.864034  rice\n",
              "4  66  36  40    20.130175  81.604873  7.628473  262.717340  rice"
            ]
          },
          "metadata": {},
          "execution_count": 2
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 560
        },
        "id": "fns0a1a_W3BV",
        "outputId": "4714f5f4-58f7-481a-de23-c450b4487419"
      },
      "source": [
        "#To get info about the data set\n",
        "df_data.info()\n",
        "df_data.describe()"
      ],
      "execution_count": 3,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "<class 'pandas.core.frame.DataFrame'>\n",
            "RangeIndex: 2000 entries, 0 to 1999\n",
            "Data columns (total 8 columns):\n",
            " #   Column       Non-Null Count  Dtype  \n",
            "---  ------       --------------  -----  \n",
            " 0   N            2000 non-null   int64  \n",
            " 1   P            2000 non-null   int64  \n",
            " 2   K            2000 non-null   int64  \n",
            " 3   temperature  2000 non-null   float64\n",
            " 4   humidity     2000 non-null   float64\n",
            " 5   ph           2000 non-null   float64\n",
            " 6   rainfall     2000 non-null   float64\n",
            " 7   label        2000 non-null   object \n",
            "dtypes: float64(4), int64(3), object(1)\n",
            "memory usage: 125.1+ KB\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>N</th>\n",
              "      <th>P</th>\n",
              "      <th>K</th>\n",
              "      <th>temperature</th>\n",
              "      <th>humidity</th>\n",
              "      <th>ph</th>\n",
              "      <th>rainfall</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>count</th>\n",
              "      <td>2000.000000</td>\n",
              "      <td>2000.000000</td>\n",
              "      <td>2000.000000</td>\n",
              "      <td>2000.000000</td>\n",
              "      <td>2000.000000</td>\n",
              "      <td>2000.000000</td>\n",
              "      <td>2000.000000</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>mean</th>\n",
              "      <td>52.859500</td>\n",
              "      <td>52.844000</td>\n",
              "      <td>51.002000</td>\n",
              "      <td>25.364491</td>\n",
              "      <td>71.951877</td>\n",
              "      <td>6.490521</td>\n",
              "      <td>103.916963</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>std</th>\n",
              "      <td>37.647462</td>\n",
              "      <td>33.992991</td>\n",
              "      <td>52.299882</td>\n",
              "      <td>5.083414</td>\n",
              "      <td>22.392495</td>\n",
              "      <td>0.770847</td>\n",
              "      <td>54.850112</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>min</th>\n",
              "      <td>0.000000</td>\n",
              "      <td>5.000000</td>\n",
              "      <td>5.000000</td>\n",
              "      <td>8.825675</td>\n",
              "      <td>14.258040</td>\n",
              "      <td>3.504752</td>\n",
              "      <td>20.211267</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>25%</th>\n",
              "      <td>21.000000</td>\n",
              "      <td>26.000000</td>\n",
              "      <td>22.000000</td>\n",
              "      <td>22.602437</td>\n",
              "      <td>61.251404</td>\n",
              "      <td>5.980442</td>\n",
              "      <td>66.450605</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>50%</th>\n",
              "      <td>39.000000</td>\n",
              "      <td>49.000000</td>\n",
              "      <td>35.000000</td>\n",
              "      <td>25.323563</td>\n",
              "      <td>80.544357</td>\n",
              "      <td>6.422961</td>\n",
              "      <td>95.246217</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>75%</th>\n",
              "      <td>87.000000</td>\n",
              "      <td>67.000000</td>\n",
              "      <td>50.000000</td>\n",
              "      <td>28.085168</td>\n",
              "      <td>90.467104</td>\n",
              "      <td>6.929842</td>\n",
              "      <td>120.997164</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>max</th>\n",
              "      <td>140.000000</td>\n",
              "      <td>145.000000</td>\n",
              "      <td>205.000000</td>\n",
              "      <td>43.675493</td>\n",
              "      <td>99.981876</td>\n",
              "      <td>9.935091</td>\n",
              "      <td>298.560117</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "                 N            P  ...           ph     rainfall\n",
              "count  2000.000000  2000.000000  ...  2000.000000  2000.000000\n",
              "mean     52.859500    52.844000  ...     6.490521   103.916963\n",
              "std      37.647462    33.992991  ...     0.770847    54.850112\n",
              "min       0.000000     5.000000  ...     3.504752    20.211267\n",
              "25%      21.000000    26.000000  ...     5.980442    66.450605\n",
              "50%      39.000000    49.000000  ...     6.422961    95.246217\n",
              "75%      87.000000    67.000000  ...     6.929842   120.997164\n",
              "max     140.000000   145.000000  ...     9.935091   298.560117\n",
              "\n",
              "[8 rows x 7 columns]"
            ]
          },
          "metadata": {},
          "execution_count": 3
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "bUYAbydmXcjP"
      },
      "source": [
        "X = df_data[['N', 'P', 'K','humidity','temperature','ph','rainfall']]\n",
        "y = df_data['label']"
      ],
      "execution_count": 4,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "YrJ0Ces3XdRj"
      },
      "source": [
        "#Import from sklearn.model_selection\n",
        "from sklearn.model_selection import train_test_split"
      ],
      "execution_count": 5,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "R5tWh8qJXdUL"
      },
      "source": [
        "# dividing X, y into train and test data\n",
        "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)"
      ],
      "execution_count": 6,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "f_-XljAelG0_"
      },
      "source": [
        "Add details about n_estimator and other fields inside "
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "VQq1POw6XdXh"
      },
      "source": [
        "#Random Forest Classifier\n",
        "from sklearn.ensemble import RandomForestClassifier\n",
        "clf = RandomForestClassifier(n_estimators=100)"
      ],
      "execution_count": 7,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "RmufUjZ2XuL1"
      },
      "source": [
        "#Train the model using the training sets\n",
        "clf.fit(X_train,y_train)\n",
        "y_pred = clf.predict(X_test)"
      ],
      "execution_count": 8,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "c1HyL4wgXwwt",
        "outputId": "d7a4ae88-d478-4e3e-999f-e5f3288dac92"
      },
      "source": [
        "# Model Accuracy\n",
        "from sklearn import metrics\n",
        "print(\"Accuracy:\", metrics.accuracy_score(y_test, y_pred))"
      ],
      "execution_count": 9,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Accuracy: 0.9966666666666667\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "0Jk9K-IKa1Wz"
      },
      "source": [
        "Now we can use this Random Forest Classifier model developed to create an API to connect it to a form to get the Best Crop Recommended for given input values of NPK values, Temperature and Humidity settings, pH and Rainfall levels."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "b-iUus38cF4O"
      },
      "source": [
        "The Form and Output would look something like this!"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "rh4JYX4ScJyX"
      },
      "source": [
        "![image.png]()"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "eoBicw3hO2t6",
        "outputId": "91a2f468-6cf5-42aa-8495-f729ec0bf7c9"
      },
      "source": [
        "data = np.array([[104,18, 30, 23.603016, 60.3, 6.7, 140.91]])\n",
        "prediction = clf.predict(data)\n",
        "print(prediction)"
      ],
      "execution_count": 10,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "['coffee']\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "cp9NFk2rbQ_z"
      },
      "source": [
        "> Sure Decision Tree Classifier can be used, but if you remember the definition of Random Forest Classifier, it shows which algorithm gives a better accuracy!\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "dV4zvB4gUCCj"
      },
      "source": [
        "Let's practically look at the difference in the accuracies"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "KJdSxxXuT6Ps"
      },
      "source": [
        "#Decision Tree Classifier\n",
        "from sklearn.tree import DecisionTreeClassifier\n",
        "dfc = DecisionTreeClassifier()\n",
        "dfc.fit(X_train, y_train)\n",
        "y_pred = dfc.predict(X_test)"
      ],
      "execution_count": 11,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "lyZQ-ECvUjO2",
        "outputId": "31044c46-d4eb-4a67-8cae-f7f111f522d8"
      },
      "source": [
        "# Model Accuracy\n",
        "from sklearn import metrics\n",
        "print(\"Accuracy:\", metrics.accuracy_score(y_test, y_pred))"
      ],
      "execution_count": 12,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Accuracy: 0.99\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Kzqle0juU_Pn"
      },
      "source": [
        "> As it can be seen practically, Random Forest has better classification as compared to Decision Trees.\n",
        "\n",
        "P.S. There are various other Classification Algorithms available, but I have chosen to use and compare these two.\n",
        "\n",
        "\n",
        "\n",
        "\n"
      ]
    }
  ]
}
